90 research outputs found

    Efficient processing of large-scale spatio-temporal data

    Millions of location-aware devices, such as mobile phones, cars, and environmental sensors, constantly report their positions, often together with a timestamp and further payload data, to a server for various kinds of analyses. While the location information of the devices and reported events is represented as points and polygons, another type of spatial data is raster data, which is produced, for example, by cameras and sensors. These big spatio-temporal data sets can only be processed on scalable platforms such as Hadoop and Apache Spark, which, however, are unaware of properties such as spatial neighborhood, which makes the execution of certain queries practically impossible. Moreover, the repeated executions of analysis programs during development and by different users result in long execution times and potentially high costs for rented cluster resources, which can be reduced by reusing commonly computed intermediate results.
In this thesis, we tackle the two challenges described above. First, we present the STARK framework for processing spatio-temporal vector and raster data on the Apache Spark stack. For its operators, we identify several possible algorithms and study how they can benefit from the properties of the underlying platform. We further investigate how indexes can be realized in the distributed and parallel architecture of Big Data processing engines, and we compare methods for data partitioning, which cope differently well with data skew and data set size. Furthermore, we present an approach to reduce the amount of data to be processed at the operator level early on.
To shorten execution times, we introduce an approach to transparently materialize and recycle intermediate results of dataflow programs, based on a decision model that uses the actual operator costs. To compute these costs, we instrument the programs with profiling code that gathers the execution time and result size of each operator. In the evaluation, we first compare the various implementation and configuration options in STARK and identify scenarios in which partitioning and indexing should be applied. We further compare STARK to related systems and show that we achieve significantly better execution times, not only when exploiting existing partitioning information. In the second part of the evaluation, we show that the transparent cost-based materialization and recycling of intermediate results can reduce the execution times of programs significantly.
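The cost-based decision of whether to materialize an intermediate result can be sketched in a few lines. This is a minimal illustration under assumed I/O throughput numbers and a made-up cost formula, not the thesis' actual decision model:

```python
# Hypothetical sketch: materialize an intermediate result when the recomputation
# time saved across expected reuses outweighs the one-time cost of writing it.
# The throughput constants and the profile fields are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class OperatorProfile:
    compute_secs: float    # measured execution time of the operator (from profiling)
    result_bytes: int      # measured size of its intermediate result
    expected_reuses: int   # how often we expect the result to be read again

def should_materialize(p: OperatorProfile,
                       write_mb_per_sec: float = 100.0,
                       read_mb_per_sec: float = 200.0) -> bool:
    """Materialize if the time saved over all reuses exceeds the write cost."""
    mb = p.result_bytes / 1e6
    write_cost = mb / write_mb_per_sec                       # one-time cost to store
    read_cost = mb / read_mb_per_sec                         # cost to load per reuse
    saved = p.expected_reuses * (p.compute_secs - read_cost)
    return saved > write_cost

# An expensive operator with a small result is a clear candidate:
decision = should_materialize(OperatorProfile(120.0, 50_000_000, 3))
```

With these assumed numbers, a two-minute operator producing 50 MB that is reused three times would be materialized, whereas a cheap operator with a huge result would not.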

    Efficient spatio-temporal event processing with STARK

    Apache Spark has become widely accepted for Big Data processing. However, when dealing with events or any other spatio-temporal data sets, Spark becomes very inefficient, as it does not include any spatial or temporal data types and operators. In this paper we demonstrate our STARK project, which adds the required data types and operators, such as spatio-temporal filter and join with various predicates, to Spark. Additionally, it includes k-nearest-neighbor search and a density-based clustering operator for data analysis tasks, as well as spatial partitioning and indexing techniques for efficient processing. During the demo, programs can be created on real-world event data sets using STARK's Scala API or our Pig Latin derivative Piglet in a web front end, which also visualizes the results.
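In plain, non-distributed Python, the two kinds of operators the demo revolves around — a spatio-temporal filter and a k-nearest-neighbor search — can be sketched roughly like this. The record layout and query values are made up for illustration; STARK implements these as distributed Spark operators in Scala:

```python
# Plain-Python sketch of a combined space/time filter and a k-nearest-
# neighbour search over (x, y, t) event records. All data is hypothetical.
import math

events = [
    {"x": 13.40, "y": 52.52, "t": 100},  # hypothetical Berlin-area event
    {"x": 13.38, "y": 52.51, "t": 220},  # hypothetical Berlin-area event
    {"x": 11.58, "y": 48.14, "t": 150},  # hypothetical Munich-area event
]

def st_filter(recs, bbox, t_range):
    """Keep records inside a spatial bounding box and a time interval."""
    (x1, y1, x2, y2), (t1, t2) = bbox, t_range
    return [r for r in recs
            if x1 <= r["x"] <= x2 and y1 <= r["y"] <= y2 and t1 <= r["t"] <= t2]

def knn(recs, qx, qy, k):
    """The k records nearest to (qx, qy) by Euclidean distance."""
    return sorted(recs, key=lambda r: math.hypot(r["x"] - qx, r["y"] - qy))[:k]

berlin_events = st_filter(events, (13.0, 52.0, 14.0, 53.0), (0, 300))
nearest = knn(events, 13.39, 52.51, 1)
```

A real implementation would push both operators through spatial partitioning and per-partition indexes rather than scanning a list.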

    Big spatial data processing frameworks: feature and performance evaluation: experiments & analyses

    Nowadays, a vast amount of data is generated and collected every moment, and this data often has a spatial and/or temporal aspect. To analyze these massive data sets, big data platforms like Apache Hadoop MapReduce and Apache Spark emerged, and extensions were created for them that take the spatial characteristics into account. In this paper, we analyze and compare existing solutions for spatial data processing on Hadoop and Spark. In our comparison, we investigate their features as well as their performance in a micro benchmark for spatial filter and join queries. Based on the results and our experiences with these frameworks, we outline the requirements for a general spatio-temporal benchmark for Big Spatial Data processing platforms and sketch first solutions to the identified problems.
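The shape of such a micro benchmark for a spatial filter query can be illustrated with a small, self-contained sketch; the workload size, query window, and timing harness are arbitrary stand-ins, not the paper's actual benchmark:

```python
# Illustrative micro benchmark for a spatial range (bounding-box) filter:
# generate random points, run the filter, and measure wall-clock time.
# Sizes and the query window are arbitrary assumptions.
import random
import time

random.seed(42)  # deterministic workload
points = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(50_000)]

def bbox_filter(pts, x1, y1, x2, y2):
    """Return all points inside the axis-aligned query rectangle."""
    return [p for p in pts if x1 <= p[0] <= x2 and y1 <= p[1] <= y2]

start = time.perf_counter()
hits = bbox_filter(points, 10, 10, 20, 20)   # window covers ~1% of the area
elapsed = time.perf_counter() - start
print(f"filter: {len(hits)} hits in {elapsed * 1000:.1f} ms")
```

A framework benchmark would run the same query shape against each system under test and compare wall-clock times across data sizes.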

    Putting Pandas in a Box

    Pandas - the Python Data Analysis Library - is a powerful and widely used framework for data analytics. In this work we present our approach to push down the computational part of Pandas scripts into the DBMS by using a transpiler. In addition to basic data processing operations, our approach also supports access to external data stored in files instead of the DBMS. Moreover, user-defined Python functions are transformed automatically into SQL UDFs executed in the DBMS. The latter allows the integration of complex computational tasks, including machine learning. We show how this feature can be used to implement a so-called model join, i.e., applying pre-trained ML models to data in SQL tables.
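The push-down idea can be illustrated with a deliberately tiny translation rule: a Pandas-style filter plus projection maps to a SELECT with a WHERE clause, so the DBMS evaluates it instead of Python. The function below is a toy assumption; a real transpiler covers far more of the Pandas API:

```python
# Toy illustration of the push-down idea: map a Pandas-style
# "df[df[column] <op> value][projection]" expression to an equivalent SQL
# query, so filter and projection run inside the DBMS. Names are hypothetical.
def pandas_filter_to_sql(table, column, op, value, projection):
    """df[df[column] <op> value][projection] -> SELECT ... FROM ... WHERE ..."""
    cols = ", ".join(projection)
    return f"SELECT {cols} FROM {table} WHERE {column} {op} {value}"

sql = pandas_filter_to_sql("orders", "amount", ">", 100, ["id", "amount"])
# sql == "SELECT id, amount FROM orders WHERE amount > 100"
```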

    Precise coupling terms in adiabatic quantum evolution: The generic case

    For multi-level time-dependent quantum systems one can construct superadiabatic representations in which the coupling between separated levels is exponentially small in the adiabatic limit. Based on results from [BeTe1] for special Hamiltonians, we explicitly determine the asymptotic behavior of the exponentially small coupling term for generic two-state systems with real-symmetric Hamiltonian. The superadiabatic coupling term takes a universal form and depends only on the location and the strength of the complex singularities of the adiabatic coupling function. As shown in [BeTe1], first-order perturbation theory in the superadiabatic representation then allows one to describe the time development of exponentially small adiabatic transitions and thus to rigorously confirm Michael Berry's [Ber] predictions on the universal form of adiabatic transition histories.
Comment: 30 pages, 1 figure

    Processing large raster and vector data in apache spark

    In many cases, spatial data processing frameworks are limited to vector data. However, an important type of spatial data is raster data, which is produced by sensors on satellites, but also by high-resolution cameras taking pictures of nano structures such as chips on wafers. Often the raster data sets become large and need to be processed in parallel in a cluster environment. In this paper we demonstrate our STARK framework with its support for raster data and its functionality to combine raster and vector data in filter and join operations. To save engineers from the burden of learning a programming language, queries can be formulated in SQL in a web interface. In the demonstration, users can use this web interface to inspect examples of raster data using our extended SQL queries on an Apache Spark cluster.
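A raster/vector filter of the kind demonstrated can be sketched in a few lines: keep only the raster cells whose centre lies inside a query region, simplified here to an axis-aligned rectangle. Cell layout and all names are illustrative assumptions, not STARK's raster representation:

```python
# Sketch of a raster/vector filter: select the raster cells whose centre
# falls inside a rectangular query region. Geometry and cell size are
# made-up assumptions for illustration.
def cells_in_region(raster, origin, cell_size, region):
    """raster: 2D list of values; origin: (x0, y0) of cell (0, 0);
    region: (x1, y1, x2, y2) query rectangle."""
    x0, y0 = origin
    x1, y1, x2, y2 = region
    hits = []
    for row_i, row in enumerate(raster):
        for col_i, val in enumerate(row):
            # centre coordinates of this cell in world space
            cx = x0 + (col_i + 0.5) * cell_size
            cy = y0 + (row_i + 0.5) * cell_size
            if x1 <= cx <= x2 and y1 <= cy <= y2:
                hits.append(((row_i, col_i), val))
    return hits

raster = [[1, 2], [3, 4]]
inside = cells_in_region(raster, (0.0, 0.0), 10.0, (0.0, 0.0, 10.0, 10.0))
```

A raster/vector join generalizes this by testing cells against arbitrary polygons instead of one rectangle, typically after spatially partitioning both inputs.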

    Interactions of multi-quark states in the chromodielectric model

    We investigate 4-quark ($qq\bar{q}\bar{q}$) systems as well as multi-quark states with a large number of quarks and anti-quarks using the chromodielectric model. For the former type of system, we study the flux distribution and the corresponding energy for planar and non-planar geometries. From the comparison to the case of two independent $q\bar{q}$-strings we deduce the interaction potential between two strings. We find an attraction between strings and a characteristic string flip if there are two degenerate string combinations between the four particles. The interaction shows no strong Van der Waals forces, and the long-range behavior of the potential is well described by a Yukawa potential, which might be confirmed in future lattice calculations. The multi-quark states develop an inhomogeneous porous structure even for particle densities large compared to nuclear-matter constituent quark densities. We present first results on the dependence of the system on the particle density, pointing towards a percolation-type transition from a hadronic matter phase to a quark matter phase. The critical energy density is found at $\epsilon_c = 1.2\,\mathrm{GeV/fm^3}$.
Comment: 19 pages, 40 eps-figures, RevTex 4, v2: typos corrected

    The time-dependent Born-Oppenheimer approximation

    We explain why the conventional argument for deriving the time-dependent Born-Oppenheimer approximation is incomplete and review recent mathematical results, which clarify the situation and at the same time provide a systematic scheme for higher-order corrections. We also present a new elementary derivation of the correct second-order time-dependent Born-Oppenheimer approximation and discuss, as applications, the dynamics near a conical intersection of potential surfaces and reactive scattering.
Comment: 17 pages, no figures

    Feedback between erosion and active deformation: geomorphic constraints from the frontal Jura fold-and-thrust belt (eastern France)

    A regional tectono-geomorphic analysis indicates a Pliocene to recent rock uplift of the outermost segment of the Jura fold-and-thrust belt, which spatially coincides with the intra-continental Rhine-Bresse Transfer Zone. Elevated remnants of the partly eroded Middle Pliocene Sundgau-ForĂȘt de Chaux Gravels, identified by heavy mineral analyses, allow for a paleo-topographic reconstruction that yields minimum regional Latest Pliocene to recent rock uplift rates of 0.05 ± 0.02 mm/year. This uplift also affected the Pleistocene evolution of the Ognon and Doubs drainage basins and is interpreted as being tectonically controlled. While the Ognon River was deflected from the uplifted region, the Doubs incised deeply into it. Focused incision of the Doubs possibly sustained ongoing deformation along anticlines that were initiated during the Neogene evolution of the thin-skinned Jura fold-and-thrust belt. At present, this erosion-related active deformation is taking place synchronously with thick-skinned tectonics, controlling the inversion of the Rhine-Bresse Transfer Zone. This suggests local decoupling between seismogenic basement faulting and erosion-related deformation of the Mesozoic cover sequence.

    Grundlinien der Wirtschaftsentwicklung 2010/2011

    DIW Berlin expects economic growth of around two percent for both 2010 and 2011. The main driving forces come from domestic demand, which, with the exception of corporate investment, is supported to a large extent by government stabilization programs and by the automatic stabilizers. The most important pillar is private consumption, which benefits from considerable gains in households' purchasing power. For exports, no strong recovery is to be expected at first: owing to their specialization in capital goods and their still relatively small market share in the growth centers of the world economy, German exports are likely to participate more noticeably in the global upswing only with a delay, and thus not before next year. Although the number of unemployed will exceed the four-million mark next year, the decline in employment is comparatively mild given the preceding slump in production. This is made possible by weak productivity growth and an only gradual normalization of hours worked. At the same time, prices remain largely stable, with an inflation rate of around one percent; this, however, presupposes a calming of the commodity markets, which the forecast assumes. Overall, the setbacks caused by the severe economic crisis have not yet been overcome: only toward the end of 2011 is Germany's economic output likely to return to its mid-2008 level, i.e., the level before the dramatic slump in production. In purely arithmetical terms, this corresponds to more than three years of zero growth. For monetary policy, the question is when to exit the expansionary course.
Given the remaining uncertainties about the further economic recovery and the sustainability of the financial market stabilization, an only gradual withdrawal of the excessive liquidity supply is advisable, especially since the price stability objective is currently not at risk. The federal government's budgetary and fiscal policy must be viewed critically: its plans (cutting social security contributions, tax reform, health care reform, and compliance with the debt brake from 2016) may each have some justification on their own, but taken as a whole these measures cannot be realized simultaneously. This inconsistency in economic policy can contribute considerably to uncertainty among private households and firms. A stronger setting of priorities and a clearer overall concept are urgently needed here.
Economic outlook, business cycle forecast